ABSTRACT

In the absence of the complex astrophysical processes that characterize the reionization era,
the 21-cm emission from neutral hydrogen (HI) in the post-reionization epoch is believed to be
an excellent tracer of the underlying dark matter distribution. Assuming a background
cosmology, it is modelled through (i) a bias function $b(k, z)$, which relates HI to the dark
matter distribution, and (ii) a mean neutral fraction ($\bar{x}_{\rm HI}$), which sets its
amplitude. In this paper, we investigate the nature of large scale HI bias. The
post-reionization HI is modelled using gravity-only N-body simulations and a suitable
prescription for assigning gas to the dark matter halos. Using the simulated bias as the
fiducial model for HI distribution at $z < 4$, we have generated a hypothetical data set for
the 21-cm angular power spectrum ($C_\ell$) using a noise model based on parameters of an
extended version of the GMRT. The binned $C_\ell$ is assumed to be measured with SNR $> 4$ in
the range $400 < \ell < 8000$ at a fiducial redshift $z = 2.5$. We explore the possibility of
constraining $b(k)$ using Principal Component Analysis (PCA) on this simulated data. Our
analysis shows that in the range $0.2 < k < 2$ Mpc$^{-1}$, the simulated data set cannot
distinguish between models exhibiting different $k$ dependences, provided
$1 \lesssim b(k) \lesssim 2$, which sets the 2-$\sigma$ limits. This justifies the use of a
linear bias model on large scales. The largely uncertain $\bar{x}_{\rm HI}$ is treated as a free
parameter, resulting in degradation of the bias reconstruction. The given simulated data is
found to constrain the fiducial $\bar{x}_{\rm HI}$ with an accuracy of $\sim 4\%$ (2-$\sigma$
error). The method outlined here could be successfully implemented on future observational data
sets to constrain $b(k, z)$ and $\bar{x}_{\rm HI}$ and thereby enhance our understanding of the
low redshift Universe.

Key words: cosmology: theory - large-scale structure of Universe - cosmology: diffuse radiation



1 INTRODUCTION 

Following the epoch of reionization ($z \sim 6$), the low density gas gets completely ionized
(Becker, Fan & White 2001; Fan, Carilli & Keating 2006). However, a small fraction of neutral
hydrogen (HI) survives, and is confined to the over-dense regions of the IGM. At these
redshifts the bulk of the neutral gas is contained in clouds with column density greater than
$2 \times 10^{20}$ atoms/cm$^2$. Observations indicate that these regions can be identified as
Damped Ly-$\alpha$ (DLA) systems (Wolfe, Gawiser & Prochaska 2005), which are self-shielded
from further ionization and house $\sim 80\%$ of the HI at $1 < z < 4$. In this redshift range
the neutral fraction remains constant with $\Omega_{\rm HI} \sim 0.001$ (Lanzetta, Wolfe &
Turnshek 1995; Storrie-Lombardi, McMahon & Irwin 1996; Rao & Turnshek 2000; Peroux et al. 2003).

The distribution and clustering properties of DLAs suggest that they are associated with
galaxies, which represent highly non-linear matter over-densities (Haehnelt, Steinmetz & Rauch
2000). These clumped HI regions saturate the Gunn-Peterson optical depth (Gunn & Peterson 1965)
and hence cannot be probed using Ly-$\alpha$ absorption. They are, however, the dominant source
of the 21-cm radiation. In the post-reionization epoch, Ly-$\alpha$ scattering and the
Wouthuysen-Field coupling (Wouthuysen 1952; Purcell & Field 1956; Furlanetto, Oh & Briggs 2006)
increase the population of the hyperfine triplet state of HI. This makes the spin temperature
$T_s$ much greater than the CMB temperature $T_\gamma$, whereby the 21-cm radiation is seen in
emission (Madau, Meiksin & Rees 1997; Bharadwaj & Ali 2004; Loeb & Zaldarriaga 2004). The 21-cm
flux from individual HI clouds is too weak ($< 10\,\mu$Jy) for detection in radio observations
with existing facilities, unless gravitational lensing by intervening matter significantly
enhances the image of the clouds (Saini, Bharadwaj & Sethi 2001). The redshifted 21-cm signal,
however, forms a diffuse background in all radio observations at $z < 6$ (frequencies $> 203$
MHz). Several radio telescopes, such as the presently functioning GMRT and future instruments
such as the MWA, aim to detect this weak cosmological signal submerged in large astrophysical
foregrounds (Santos, Cooray & Knox 2005; McQuinn et al. 2006; Ali, Bharadwaj & Chengalur 2008).

The study of large scale structures in redshift surveys and numerical simulations reveals that
galaxies (and, for that matter, any non-linear structure) trace the underlying dark matter
distribution with a possible bias (Mo & White 1996; Dekel & Lahav 1999). Associating the
post-reionization HI with dark matter halos implies that the gas traces the underlying dark
matter distribution with a possible bias function $b(k) = [P_{\rm HI}(k)/P(k)]^{1/2}$, where
$P_{\rm HI}(k)$ and $P(k)$ denote the power spectra of HI and dark matter density fluctuations
respectively. This function is believed to quantify the clustering property of the neutral gas.
It is believed that on small scales (below the Jeans length), the bias is a scale dependent
function. However, it is reasonably scale-independent on large scales (Fang et al. 1993).
Further, the bias depends on the redshift. The use of the post-reionization 21-cm signal
(Bharadwaj & Sethi 2001; Bharadwaj, Nath & Sethi 2001; Wyithe & Loeb 2007; Loeb & Wyithe 2008;
Wyithe & Loeb 2008, [...]) opens up new avenues towards various [...] model is crucial while
forecasting or interpreting some of these results.

In this paper we have investigated the nature of HI bias in the post-reionization epoch. The HI
fluctuations are simulated at post-reionization redshifts and the HI bias is obtained at
various redshifts from the simulated dark matter and HI power spectra. This is similar to the
earlier work by Bagla, Khandai & Datta (2010) and Marín et al. (2010). The simulated bias
function is assumed to be our fiducial model for HI distribution at low redshifts. We have
studied the feasibility of constraining this fiducial model with observed data. Here we have
focused on the multi frequency angular power spectrum (MAPS) (Datta, Choudhury & Bharadwaj
2007), which is measurable directly from observed radio data and depends on the bias model.
Assuming a standard cosmological model and a known dark matter power spectrum, we have used
Principal Component Analysis (PCA) on simulated MAPS data for a hypothetical
radio-interferometric experiment to put constraints on the bias model. The method is similar to
the one used for power spectrum estimation using the CMB data (Efstathiou & Bond 1999; Hu &
Holder 2003; Leach 2006) and for constraining reionization (Mitra, Choudhury & Ferrara 2011;
Mitra, Choudhury & Ferrara 2012). Stringent constraints on the bias function with future data
sets would be crucial in modelling the distribution of neutral gas at low redshifts and would
justify the use of HI as a tracer of the underlying dark matter field. This would be useful for
both analytical and numerical work involving the post-reionization HI distribution.

The paper is organized as follows: in the next section we discuss the simulation of HI
distribution and the general features of the bias function. Following that, we discuss the HI
multi-frequency angular power spectrum (MAPS), a statistical quantifier directly measurable
from radio-interferometric experiments. Finally we use the principal component analysis to
investigate the possibility of constraining the bias model with simulated MAPS (Datta,
Choudhury & Bharadwaj 2007) data.



2 SIMULATION RESULTS - THE BIAS MODEL 

We have obtained the dark matter distribution using the PM N-body code developed by Bharadwaj
& Srikant (2004), assuming a fiducial cosmological model (used throughout the paper)
$\Omega_m = 0.2726$, $\Omega_\Lambda = 0.726$, $\Omega_b = 0.0456$, $h = 0.705$,
$T_{\rm cmb} = 2.72S$ K, $\sigma_8 = 0.809$, $n_s = 0.96$ (all parameters from WMAP 7 year data;
Komatsu et al. 2011; Jarosik et al. 2011). We simulate $608^3$ particles on a $1216^3$ grid
with grid spacing 0.1 Mpc in a (121.6 Mpc)$^3$ box. The mass assigned to each dark matter
particle is $m_{\rm part} = 2.12 \times 10^{8}\, M_\odot h^{-1}$. The initial particle
distribution and velocity field, generated using the Zel'dovich approximation (at $z \sim 25$),
are evolved only under gravity. The particle positions and velocities are then obtained as
output at different redshifts $1.5 \le z \le 4$ at intervals of $\delta z = 0.5$. We have used
the Friends-of-Friends algorithm (Davis et al. 1985) to identify dark matter over-densities as
halos, taking linking length $b = 0.2$ (in units of mean inter-particle distance). This gives a
reasonably good match with the theoretical halo mass function (Jenkins et al. 2001; Sheth &
Tormen 2002) for masses as small as $10\, m_{\rm part}$. The
halo mass function obtained from simulation is found to be in ex- 
cellent agreement with the Sheth-Tormen mass function in the mass 
range lO" < M < 10^^ H'^Mq. 
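As a rough consistency check on the quoted simulation parameters, the particle mass follows from the mean matter density and the volume per particle. The sketch below assumes the box side is 121.6 Mpc (not $h^{-1}$ Mpc) and uses the standard critical density $2.775 \times 10^{11}\, h^2 M_\odot$ Mpc$^{-3}$; the function name is illustrative and this is not the authors' code.

```python
# Consistency check of the particle mass quoted for the N-body run (a sketch).
RHO_CRIT = 2.775e11  # critical density today, in h^2 M_sun / Mpc^3 (standard value)


def particle_mass(omega_m, h, box_mpc, n_side):
    """Mass per particle in units of h^-1 M_sun for a cubic box of side box_mpc (Mpc)."""
    volume = box_mpc ** 3                                # Mpc^3
    total_mass = omega_m * RHO_CRIT * h ** 2 * volume    # M_sun
    mass_msun = total_mass / n_side ** 3                 # M_sun per particle
    return mass_msun * h                                 # expressed in h^-1 M_sun


m_part = particle_mass(omega_m=0.2726, h=0.705, box_mpc=121.6, n_side=608)
print(f"m_part = {m_part:.3e} h^-1 M_sun")
```

Under these assumptions the result is close to $2.12 \times 10^{8}\, h^{-1} M_\odot$, in line with the particle mass stated in the text.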

We follow the prescription of Bagla, Khandai & Datta (2010) to populate the halos with neutral
hydrogen and thereby identify them as DLAs. Equation (3) of Bagla, Khandai & Datta (2010)
relates the virial mass of halos, $M$, with its circular velocity $v_{\rm circ}$. The neutral
gas in halos can self-shield itself from ionizing radiation only if the circular velocity is
above a threshold of $v_{\rm circ} = 30$ km/s at $z \sim 3$. This sets a lower cut-off for the
halo mass, $M_{\rm min}$. Further, halos are populated with gas in such a way that the very
massive halos do not contain any HI. An upper cut-off to the halo mass, $M_{\rm max}$, is
chosen using $v_{\rm circ} = 200$ km/s, above which we do not assign any HI to halos. This is
consistent with the observation that very massive halos do not contain any gas in neutral form
(Pontzen et al. 2008). The total neutral gas is then distributed such that the mass of the gas
assigned is proportional to the mass of the halo between these two cut-off limits. We note that
there is nothing canonical about this scheme. However, with the basic physical picture in the
background, this is the simplest model. Results obtained using alternative HI assignment
schemes are not expected to be drastically different (Bagla, Khandai & Datta 2010).
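The assignment rule described above (no HI outside the two mass cut-offs, HI mass proportional to halo mass in between) can be sketched as follows. The halo masses, cut-off values and total HI mass are hypothetical placeholders; in particular the cut-offs are not derived here from the circular velocity relation of Bagla, Khandai & Datta (2010).

```python
def assign_hi(halo_masses, m_min, m_max, total_hi_mass):
    """Share total_hi_mass among halos in proportion to halo mass.

    Halos below m_min or above m_max receive no HI.
    """
    eligible = [m if m_min <= m <= m_max else 0.0 for m in halo_masses]
    norm = sum(eligible)
    if norm == 0.0:
        raise ValueError("no halo lies between the two cut-offs")
    return [total_hi_mass * m / norm for m in eligible]


# Hypothetical halo catalogue and cut-offs (same arbitrary mass units throughout).
halos = [5e8, 2e9, 1e10, 4e11, 3e12]
hi = assign_hi(halos, m_min=1e9, m_max=1e12, total_hi_mass=1.0)
print(hi)
```

The first and last halos fall outside the cut-offs and receive nothing; the remaining three share the total in the ratio of their masses.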

Figure 1 shows the simulated power spectra of dark matter and HI distribution at a fiducial
redshift $z = 2.5$. The dark matter power spectrum is seen to be consistent with the transfer
function given by Eisenstein & Hu (1998) and the scale invariant primordial power spectrum
(Harrison 1970; Zeldovich 1972). The HI power spectrum has a greater amplitude than its dark
matter counterpart in the entire $k$-range allowed by the simulation parameters.

Figure 1. The simulated power spectra for dark matter distribution (solid line) and the HI
density field (dashed line) at redshift z = 2.5.

Figure 2 shows the behaviour of the bias function $b(k, z)$. We have obtained the scale
dependence of the HI bias for various redshifts in the range $1.5 \le z \le 4$. At these
redshifts, the bias is seen to be greater than unity, a feature that is observed in the
clustering of high redshift galaxies (Mo & White 1996; Wyithe & Brown 2009). On large
cosmological scales the bias remains constant, and it grows monotonically at small scales,
where non-linear effects are at play. This is a generic feature seen at all redshifts. The
$k$-range over which the bias function remains scale independent is larger at the lower
redshifts. The linear bias model is hence seen to hold reasonably well on large scales. The
scale dependence of bias for a given redshift is fitted using a cubic polynomial with
parameters summarized in Table 1. The inset in Figure 2 shows the redshift dependence of the
linear bias, which indicates a monotonic increase. This is also consistent with the expected
$z$-dependence of high redshift galaxy bias. The behaviour of the linear bias for small
$k$-values as a function of redshift is non-linear and can be fitted by an approximate power
law in $z$. This scaling relationship of bias with $z$ is found to be sensitive to the mass
resolution of the simulation. A similar dependence of HI bias on $k$ and $z$ has been observed
earlier by Bagla, Khandai & Datta (2010) with a computationally robust Tree N-body code. Here
we show that the same generic features and similar scaling relations for bias can be obtained
by using a simpler and computationally less expensive PM N-body code. Our aim is to use this
scale and redshift dependence of bias, obtained from our simulation, as the fiducial model for
the post-reionization HI distribution. We shall subsequently investigate the feasibility of
constraining this model using Principal Component Analysis (PCA) on simulated MAPS data.
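The two steps used in this section, estimating the bias from the ratio of power spectra and fitting its scale dependence with a cubic polynomial, can be sketched with NumPy. The power spectra and coefficients below are toy inputs chosen only to exercise the code; they are not the simulated spectra or the fitted values of Table 1. The same `np.polyfit` call applies unchanged if the fitted quantity is $b^2(k)$ rather than $b(k)$.

```python
import numpy as np


def bias_from_power(p_hi, p_dm):
    """Bias b(k) = sqrt(P_HI(k) / P(k)) evaluated bin by bin."""
    return np.sqrt(np.asarray(p_hi, dtype=float) / np.asarray(p_dm, dtype=float))


# Toy inputs (hypothetical): a power-law matter spectrum and a cubic bias.
k = np.logspace(-1.0, 0.5, 40)                        # Mpc^-1
p_dm = 1.0e3 * k ** -1.5                              # toy dark matter spectrum
b_true = 1.2 + 0.1 * k + 0.05 * k ** 2 + 0.01 * k ** 3
p_hi = b_true ** 2 * p_dm                             # toy HI spectrum

b_est = bias_from_power(p_hi, p_dm)

# Cubic fit; np.polyfit returns the highest-order coefficient first.
c3, c2, c1, c0 = np.polyfit(k, b_est, deg=3)
print(c3, c2, c1, c0)
```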



3 HI 21-CM ANGULAR POWER SPECTRUM - 
SIMULATED DATA 

Figure 2. The simulated bias function for z = 1.5, 2.0, 2.5, 3.0, 3.5 and 4.0 (bottom to top)
showing the scale dependence. The inset shows the variation of the large-scale linear bias as a
function of redshift.

Table 1. The fit parameters for bias function of the form
$b^2(k) = c_3 k^3 + c_2 k^2 + c_1 k + c_0$ for various redshifts $1.5 \le z \le 4.0$.

Redshifted 21-cm observations have a unique advantage over other cosmological probes since they
map the 3D density field and give a tomographic image of the Universe. In this paper we have
quantified the statistical properties of the fluctuations in the redshifted 21-cm brightness
temperature $T(\hat{n}, z)$ on the sky through the multi frequency angular power spectrum
(MAPS), defined as $C_\ell(\Delta z) = \langle a_{\ell m}(z)\, a^*_{\ell m}(z + \Delta z) \rangle$,
where $a_{\ell m}(z) = \int d\Omega_{\hat{n}}\, Y^*_{\ell m}(\hat{n})\, T(\hat{n}, z)$. This
measures the correlation of the spherical harmonic components of the temperature field at two
redshift slices separated by $\Delta z$. In the flat-sky approximation and incorporating the
redshift space distortion effect we have (Datta, Choudhury & Bharadwaj 2007)


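$$ C_\ell(\Delta z) = \frac{\bar{T}^2}{\pi r^2} \int_0^\infty dk_\parallel \,
   \cos(k_\parallel \Delta r)\, P_{\rm HI}(\mathbf{k}) \qquad (1) $$

for correlation between HI at comoving distances $r$ and $r + \Delta r$, where

$$ \bar{T}(z) = 4.0\,{\rm mK}\,(1+z)^2 \left(\frac{\Omega_b h^2}{0.02}\right)
   \left(\frac{0.7}{h}\right) \frac{H_0}{H(z)}, $$

$k = \sqrt{\ell^2/r^2 + k_\parallel^2}$ and $P_{\rm HI}$ denotes the redshift space HI power
spectrum given by

$$ P_{\rm HI}(\mathbf{k}) = \bar{x}_{\rm HI}^2\, b^2(k, z)\, D_+^2
   \left[ 1 + \beta \mu^2 \right]^2 P(k) \qquad (2) $$

where the mean neutral fraction $\bar{x}_{\rm HI}$ is assumed to have a fiducial value
$2.45 \times 10^{-2}$, $\mu = k_\parallel/k$, $\beta = f/b(k, z)$ and
$f = d\ln D_+/d\ln a$. Here $D_+$ represents the growing mode of density perturbations, $a$ is
the cosmological scale factor and $P(k)$ denotes the present day matter power spectrum.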
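To make the structure of Equation (2) concrete, a minimal sketch of the redshift-space HI power spectrum as a function is given below. The argument names are illustrative, and the numerical inputs in the usage line (bias, growing mode, growth rate, matter power) are hypothetical values, not results from the paper.

```python
def p_hi_redshift_space(p_matter, bias, mu, growth, f_growth, x_hi=2.45e-2):
    """Redshift-space HI power spectrum following the form of Eq. (2).

    p_matter : present-day matter power spectrum P(k) at the chosen k
    bias     : HI bias b(k, z)
    mu       : cosine of the angle between k and the line of sight
    growth   : growing mode D_+ at the redshift of interest
    f_growth : logarithmic growth rate f = dlnD_+/dlna
    x_hi     : mean neutral fraction (fiducial value from the text)
    """
    beta = f_growth / bias
    return x_hi ** 2 * bias ** 2 * growth ** 2 * (1.0 + beta * mu ** 2) ** 2 * p_matter


# Hypothetical inputs for illustration only.
example = p_hi_redshift_space(p_matter=1.0e3, bias=1.5, mu=0.5, growth=0.36, f_growth=0.97)
print(example)
```

For $\mu = 0$ the redshift-space factor reduces to unity, so modes transverse to the line of sight are unaffected by the distortion term.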

We use MAPS as an alternative to the more commonly used 3D power spectrum since it has a few
features that make its measurement more convenient. Firstly, we note that as a function of
$\ell$ (angular scales) and $\Delta z$ (radial separations) the MAPS encapsulates the entire
three dimensional information regarding the HI distribution. In this approach, the fluctuations
in the transverse direction are Fourier transformed, while the radial direction is kept
unchanged in the real frequency space. No cosmological information is however lost. Secondly,
the 21-cm signal is deeply submerged in astrophysical foregrounds. These foregrounds are known
to have a smooth and slow variation with frequency, whereas the signal is more localized along
the frequency axis. This distinct spectral behaviour has been proposed as a useful means to
separate the cosmological signal from foreground contaminants. In fact, it has been shown that
foregrounds can be completely removed by subtracting out a suitable polynomial in $\Delta\nu$
from $C_\ell(\Delta\nu)$ (Ghosh et al. 2011). It is hence advantageous to use MAPS, which
maintains the difference between the frequency and angular information in an observation. The
3D power spectrum, on the contrary, mixes up frequency and transverse information through the
full 3D Fourier transform. Further, for a large band width radio observation covering large
radial separations, the light cone effect is expected to affect the signal. This can also be
easily incorporated into MAPS, unlike the 3D power spectrum which mixes up the information from
different time slices. The key advantage, however, in using the angular power spectrum is that
it can be obtained directly from radio data. The quantity of interest in radio-interferometric
experiments is the complex visibility $V(\mathbf{U}, \nu)$ measured for a pair of antennas
separated by a distance $\mathbf{d}$ as a function of baseline $\mathbf{U} = \mathbf{d}/\lambda$
and frequency $\nu$. The method of visibility correlation to estimate the angular power
spectrum is well established (Bharadwaj & Sethi 2001; Bharadwaj & Ali 2005). This follows from
the fact that $\langle V(\mathbf{U}, \nu)\, V^*(\mathbf{U}, \nu + \Delta\nu) \rangle \propto
C_\ell(\Delta\nu)$. Here the angular multipole $\ell$ is identified with the baseline $U$ as
$\ell = 2\pi U$, and one has assumed that the antenna primary beam is either de-convolved or is
sufficiently peaked so that it may be treated as a Dirac delta function. Further, the constant
of proportionality takes care of the units and depends on the various telescope parameters.

The angular power spectrum at a multipole $\ell$ is obtained by projecting the 3D power
spectrum. The integral in Equation 1 sums over the modes whose projection on the plane of the
sky is $\ell/r$. Hence, $C_\ell$ has contributions from the matter power spectrum only for
$k > \ell/r$. The shape of $C_\ell$ is dictated by the matter power spectrum $P(k)$ and the
bias $b(k)$. The amplitude depends on quantities set by the background cosmological model as
well as the astrophysical properties of the IGM. We emphasize here that the mean neutral
fraction and the HI bias are the only two non-cosmological parameters in our model for the HI
distribution at low redshifts. Predicting the nature of $C_\ell$ in a given cosmological
paradigm is then crucially dependent on the underlying bias model and the value of the neutral
fraction.
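The projection described above can be illustrated by evaluating the flat-sky integral over $k_\parallel$ numerically. The sketch below uses a simple trapezoidal rule and a toy power spectrum $1/(k^2 + k_0^2)$, chosen only because the integral at $\Delta r = 0$ then has the closed form $\pi/(2\sqrt{\ell^2/r^2 + k_0^2})$ against which the code can be checked. The comoving distance (a round value of 5900 Mpc for $z \approx 2.5$), the integration cut-off and $\bar{T} = 1$ are assumptions made for illustration.

```python
import numpy as np


def c_ell(ell, r, p_hi, t_bar=1.0, delta_r=0.0, k_par_max=1.0e3, n_grid=200001):
    """Flat-sky angular power spectrum by trapezoidal integration over k_parallel.

    p_hi(k, mu) must return the redshift-space HI power spectrum.
    """
    k_par = np.linspace(0.0, k_par_max, n_grid)
    k_perp = ell / r                      # mode projected on the plane of the sky
    k = np.sqrt(k_par ** 2 + k_perp ** 2)
    mu = k_par / k
    integrand = np.cos(k_par * delta_r) * p_hi(k, mu)
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(k_par))
    return t_bar ** 2 / (np.pi * r ** 2) * integral


R_COMOVING = 5900.0   # Mpc, assumed round value for z ~ 2.5
K0 = 0.1              # Mpc^-1, scale of the toy spectrum


def p_toy(k, mu):
    """Toy isotropic spectrum with a known analytic projection."""
    return 1.0 / (k ** 2 + K0 ** 2)


c_num = c_ell(1000.0, R_COMOVING, p_toy)
c_exact = 1.0 / (2.0 * R_COMOVING ** 2 * np.sqrt((1000.0 / R_COMOVING) ** 2 + K0 ** 2))
print(c_num, c_exact)
```

Only modes with $k \ge \ell/r$ enter the sum, which is why a larger $\ell$ probes smaller scales of the underlying 3D spectrum.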

The $\Delta\nu$ dependence of the MAPS $C_\ell(\Delta\nu)$ measures the correlation between the
various 2D modes as a function of radial separation $\Delta r$ ($\Delta\nu$). The signal is
seen to decorrelate for large radial separations, the decorrelation being faster for larger
$\ell$ values. For a given $\ell$, one gets independent estimates of $C_\ell$ for radial
separations greater than the correlation length. Projection of the 3D power spectrum leads to
the availability of fewer Fourier modes. However, for a given band width $B$, one may combine
the signals emanating from epochs separated by the correlation length $\Delta\nu_c$ in the
radial direction. Noting that the amplitude of the signal does not change significantly over
the radial separation corresponding to the band width, one has $\sim B/\Delta\nu_c$ independent
measurements of $C_\ell(\Delta z = 0)$. We have adopted the simplified picture where the noise
in $C_\ell(\Delta z = 0)$ gets reduced owing to the combination of these $B/\Delta\nu_c$
realizations. A more complete analysis would incorporate the correlation for
$\Delta\nu < \Delta\nu_c$. We plan to take this up in a future work.

Figure 3 shows the 3D HI power spectrum at the fiducial redshift $z = 2.5$ obtained using the
dark matter power spectrum of Eisenstein & Hu (1998). We have used the WMAP 7 year cosmological
model throughout. Figure 4 shows the corresponding HI angular power spectrum. The shape of
$C_\ell$ is dictated by the shape of the matter power spectrum, the bias function, and the
background cosmological model. The amplitude is set by various quantities that depend on the
cosmological model and the growth of linear perturbations. The global mean neutral fraction
also appears in the amplitude and plays a crucial role in determining the mean level of 21-cm
emission. Hence, for a fixed cosmological model, the bias and the neutral fraction solely
determine the fluctuations of the post-reionization HI density field. We have used the bias
model obtained from numerical simulations in the last section to evaluate $C_\ell$. We assume
that the binned angular power spectrum is measured at seven $\ell$ bins, the data being
generated using Equation 1 with the fiducial bias model.

The noise estimates are presented using the formalism used by Mao et al. (2008) for the 3D
power spectrum and by Bharadwaj & Ali (2005) and Bagla, Khandai & Datta (2010) for the angular
power spectrum. We have used hypothetical telescope parameters for these estimates. We consider
a radio telescope with 60 GMRT-like antennae (diameter 45 m) distributed randomly over a region
1 km x 1 km. We assume $T_{\rm sys} \sim 100$ K. We consider a radio observation at frequency
$\nu = 405$ MHz with a bandwidth $B = 32$ MHz for an observation time of 1000 hrs.
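The relation $\ell = 2\pi U$ with $U = d/\lambda$ gives a quick feel for the multipoles such an array can reach. The sketch below is a rough consistency check, not the noise calculation itself: it assumes the shortest usable baseline is about one antenna diameter (45 m) and takes the longest baseline to be the 1 km side of the array, both of which are illustrative choices. It also converts the observing frequency to the redshift of the 21-cm line.

```python
import math

C_LIGHT = 299792458.0        # speed of light, m/s
NU_21CM_MHZ = 1420.40575     # rest frequency of the HI hyperfine line, MHz


def multipole(baseline_m, freq_hz):
    """Angular multipole l = 2 pi U probed by a baseline, with U = d / lambda."""
    wavelength = C_LIGHT / freq_hz
    return 2.0 * math.pi * baseline_m / wavelength


freq_hz = 405.0e6
ell_min = multipole(45.0, freq_hz)      # assumed shortest baseline ~ dish diameter
ell_max = multipole(1000.0, freq_hz)    # assumed longest baseline ~ array side
redshift = NU_21CM_MHZ / 405.0 - 1.0

print(f"l_min ~ {ell_min:.0f}, l_max ~ {ell_max:.0f}, z = {redshift:.3f}")
```

Under these assumptions the accessible range is roughly a few hundred to several thousand in $\ell$, comparable to the $\ell$ range over which the binned power spectrum is assumed to be measured, and the observing frequency corresponds to $z \approx 2.5$.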

In order to attain the desired sensitivities we have assumed that the data is binned, whereby
several nearby $\ell$-modes are combined to increase the SNR. Further, in the radial direction,
the signal is assumed to decorrelate for $\Delta\nu > 0.5$ MHz, so that we have 64 independent
measurements of $C_\ell$ for the given band width of 32 MHz. The bins chosen here allow the
binned power spectrum to be measured at a SNR $> 4$ in the entire range $400 < \ell < 8000$.
One would ideally like to measure the power spectrum at a large number of $\ell$ values, but
this would necessarily compromise the obtained sensitivities. With the given set of
observational parameters, one may, in principle, choose a finer binning. It would however
degrade the SNR below the level of detectability. Choosing arbitrarily fine $\ell$-bins and
simultaneously maintaining the same SNR would require improved observational parameters, which
may be unreasonable if not impossible. The same reasoning applies to noise estimation for the
3D power spectrum where, for a given set of observational parameters, the choice of $k$-bins is
dictated by the requirement of sensitivity. In Figure 3, showing the 3D power spectrum, a
4-$\sigma$ detection of $P_{\rm HI}(k)$ in the central bin requires the full $k$-range to be
divided into 18 equal logarithmic bins for the same observational parameters.
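The count of independent radial measurements quoted above follows from simple arithmetic. The short sketch below reproduces it and, under the additional assumption (not stated explicitly in the text) that the estimates are statistically independent with equal variance, shows the resulting reduction of the noise on the averaged $C_\ell$.

```python
import math

bandwidth_mhz = 32.0        # total band width B
decorrelation_mhz = 0.5     # separation beyond which the signal is taken to decorrelate

# Number of independent estimates of C_l across the band.
n_indep = int(bandwidth_mhz / decorrelation_mhz)

# Assumption: independent, equal-variance estimates, so the error on the mean
# falls as 1 / sqrt(N).
noise_reduction = math.sqrt(n_indep)

print(n_indep, noise_reduction)
```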

The noise in $C_\ell$ and $P_{\rm HI}(k)$ is dominated by cosmic variance at small $\ell$/$k$
(large scales), whereas instrumental noise dominates at large $\ell$/$k$ values (small scales).
We point out that the error estimates predicted for a hypothetical observation are based on
reasonable telescope parameters, and future observations are expected to reflect similar
sensitivities.



We note here that several crucial observational difficulties hinder the measurement of $C_\ell$
at a high SNR. Separating the astrophysical foregrounds, which are several orders of magnitude
larger than the signal, is a major challenge (Santos, Cooray & Knox 2005; Ghosh et al. 2010;
Ghosh et al. 2011). Several methods have been suggested for the removal of foregrounds, most of
which use the distinct spectral property of the 21-cm signal as against that of the foreground
contaminants. The multi frequency angular power spectrum (MAPS) $C_\ell(\Delta\nu)$ is itself
useful for this purpose (Ghosh et al. 2010; Ghosh et al. 2011). Whereas the signal
$C_\ell(\Delta\nu)$ decorrelates over large $\Delta\nu$, the foregrounds remain correlated, a
feature that may be used to separate the two. In our subsequent discussions we assume that the
foregrounds have been removed. As mentioned earlier, the angular power spectrum can be measured
directly from raw visibility data. One needs to incorporate the primary beam of the antenna in
establishing this connection (Bharadwaj & Ali 2005). In this paper we assume that such
difficulties are overcome and the angular power spectrum is measured with sufficiently high
SNR.

Figure 3. The theoretical 3D HI power spectrum $P_{\rm HI}(k)$ for z = 2.5 as a function of
$k$, at $\mu = 0.5$. The points with 2-$\sigma$ error-bars represent the hypothetical binned
data.

Figure 4. The theoretical angular power spectrum $C_\ell$ for z = 2.5 as a function of $\ell$.
The points with 2-$\sigma$ error-bars represent the hypothetical data.

In the next section we use the $C_\ell$ data generated with these assumptions to perform the
PCA. If the 3D HI power spectrum is measured at some $(k, \mu)$ it would be possible to
determine the bias directly from a knowledge of the dark matter power spectrum. The bias would
be measured at the $k$-values where the data is available. The results for the 3D analysis are
summarized later in the paper.



4 PRINCIPAL COMPONENT ANALYSIS 

In this section, we discuss the principal component method for constraining the bias function
using $C_\ell$ data. We consider a set of $n_{\rm obs}$ observational data points labeled by
$C_{\ell_{\rm obs}}$, where $\ell_{\rm obs}$ runs over the different $\ell$ values for which
$C_\ell$ is obtained (Fig. 4).

In our attempt to reconstruct $b(k)$ in the range $[k_{\rm min}, k_{\rm max}]$, we assume that
the bias, which is an unknown function of $k$, can be represented by a set of $n_{\rm bin}$
discrete free parameters $b_i = b(k_i)$, where the entire $k$-range is binned such that $k_i$
corresponds to the $i$-th bin of width given by

$$ \Delta \ln k_i = \frac{\ln k_{\rm max} - \ln k_{\rm min}}{n_{\rm bin} - 1} \qquad (3) $$

We have chosen $n_{\rm bin} = 61$ and a $k$-range $0.13 < k < 5.3$ Mpc$^{-1}$. Our choice is
dictated by the fact that for $k < 0.13$ Mpc$^{-1}$ the $C_\ell$ corresponding to the smallest
$\ell$ is insensitive to $b(k)$, and for $k > 5$ Mpc$^{-1}$ there is no data probing those
scales. This truncation is also justified as the Fisher information matrix, as we shall see,
tends to zero beyond this $k$-range. The Fisher matrix is constructed as

$$ F_{ij} = \sum_{\ell_{\rm obs}} \frac{1}{\sigma^2_{\ell_{\rm obs}}}
   \frac{\partial C^{\rm th}_{\ell_{\rm obs}}}{\partial b^{\rm fid}(k_i)}
   \frac{\partial C^{\rm th}_{\ell_{\rm obs}}}{\partial b^{\rm fid}(k_j)} \qquad (4) $$

where $C^{\rm th}_{\ell_{\rm obs}}$ is the theoretical (Eq. 1) $C_\ell$ evaluated at
$\ell = \ell_{\rm obs}$ using the fiducial bias model $b^{\rm fid}(k)$, and
$\sigma_{\ell_{\rm obs}}$ is the corresponding observational error. The data is assumed to be
such that the covariance matrix is diagonal, whereby only the variance
$\sigma^2_{\ell_{\rm obs}}$ suffices.

The fiducial model for bias is, in principle, expected to be close to the underlying "true"
model. In this work we have taken $b^{\rm fid}(k)$ to be the fitted polynomial obtained in the
earlier section, which matches the simulated bias up to an acceptable accuracy.

In the model for HI distribution at low redshifts, the mean neutral fraction crucially sets the
amplitude of the power spectrum. However, a lack of precise knowledge about this quantity makes
the overall amplitude of $C_\ell$ largely uncertain. To incorporate this we have treated the
quantity $\bar{x}_{\rm HI}$ as an additional free parameter over which the Fisher matrix is
marginalized. The corresponding degraded Fisher matrix is given by

$$ F^{\rm deg} = F - B F'^{-1} B^{T} \qquad (5) $$

where $F$ is the original $n_{\rm bin} \times n_{\rm bin}$ Fisher matrix corresponding to the
parameters $b_i$, $F'$ is a $1 \times 1$ Fisher matrix for the additional parameter
$\bar{x}_{\rm HI}$, and $B$ is an $n_{\rm bin} \times 1$-dimensional matrix containing the
cross-terms. We shall henceforth refer to $F^{\rm deg}$ as the Fisher matrix and implicitly
assume that the marginalization has been performed.
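The construction of the Fisher matrix for a diagonal data covariance, and its degradation on marginalizing over one extra amplitude parameter, can be sketched as below. The derivative arrays are random toy numbers (three bias bins and seven data points, chosen so that the full matrix is invertible), and the function and variable names are illustrative. The degraded matrix is the Schur complement of the full Fisher matrix with respect to the extra parameter, which is why it can be checked against the inverse of the corresponding block of the full covariance.

```python
import numpy as np


def fisher_matrix(dC_db, sigma):
    """F_ij = sum_l (dC_l/db_i)(dC_l/db_j) / sigma_l^2 for a diagonal covariance.

    dC_db has shape (n_obs, n_bin); sigma has shape (n_obs,).
    """
    weighted = dC_db / sigma[:, None]
    return weighted.T @ weighted


def degraded_fisher(dC_db, dC_dx, sigma):
    """Fisher matrix for the bias bins after marginalizing over one extra parameter x."""
    F = fisher_matrix(dC_db, sigma)
    B = (dC_db / sigma[:, None] ** 2).T @ dC_dx    # cross terms, shape (n_bin,)
    F_x = np.sum((dC_dx / sigma) ** 2)             # 1 x 1 block for x
    return F - np.outer(B, B) / F_x


# Toy derivatives and errors (hypothetical numbers).
rng = np.random.default_rng(0)
n_obs, n_bin = 7, 3
dC_db = rng.normal(size=(n_obs, n_bin))
dC_dx = rng.normal(size=n_obs)
sigma = rng.uniform(0.5, 1.5, size=n_obs)

F_deg = degraded_fisher(dC_db, dC_dx, sigma)
print(F_deg)
```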

The Fisher matrix obtained using Eq. 4 and Eq. 5 is illustrated in Figure 5 as a shaded plot in
the $k - k$ plane. The matrix shows a band diagonal structure with most of the information
accumulated in discrete regions, especially those corresponding to the $k$-modes for which the
data is available. In the regions $k > 2$ and $k < 0.2$ Mpc$^{-1}$, the value of $F_{ij}$ is
relatively small, implying that one cannot constrain $b(k)$ in those $k$-bins from the data set
we have considered in this work.

Figure 5. The degraded Fisher matrix $F^{\rm deg}_{ij}$ in the $k - k$ plane.

A suitable choice of basis ensures that the parameters are not correlated. This amounts to
writing the Fisher matrix in its eigenbasis. Once the Fisher matrix is constructed, we
determine its eigenvalues and corresponding eigenvectors. The orthonormality and completeness
of the eigenfunctions allow us to expand the deviation of $b(k_i)$ from its fiducial model,
$\delta b_i = b(k_i) - b^{\rm fid}(k_i)$, as

$$ \delta b_i = \sum_{p=1}^{n_{\rm bin}} m_p S_p(k_i) \qquad (6) $$

where $S_p(k_i)$ are the principal components of $b(k_i)$ and $m_p$ are the suitable expansion
coefficients. The advantage is that, unlike $b(k_i)$, the coefficients $m_p$ are uncorrelated.

Figure 6 shows the inverse of the largest eigenvalues. Beyond the first six, all the
eigenvalues are seen to be negligibly small. It is known that the largest eigenvalue
corresponds to the minimum variance set by the Cramer-Rao bound and vice versa. This implies
that the errors in $b(k)$ would increase drastically if modes $i > 6$ are included. Hence, most
of the relevant information is essentially contained in the first six modes with larger
eigenvalues. These normalized eigenmodes are shown in Figure 7. One can see that all these
modes almost tend to vanish for $k > 2$ and $k < 0.2$ Mpc$^{-1}$, as the Fisher matrix is
vanishingly small in these regions. The positions of the spikes and troughs in these modes are
related to the presence of data points, and their amplitudes depend on the corresponding
error-bars (the smaller the error, the larger the amplitude).
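The eigen-decomposition step can be sketched with `numpy.linalg.eigh`, which is suited to symmetric matrices. The toy degraded Fisher matrix below is built from random derivatives for 7 data points and 61 bias bins; it is not the matrix of Figure 5. With a diagonal data covariance and one marginalized amplitude, such a matrix can have at most $7 - 1 = 6$ non-zero eigenvalues, which is in line with the six significant modes discussed above. The projection in `reconstruct` only illustrates the truncated form of Eq. (6); in the paper the coefficients are constrained through MCMC rather than by direct projection.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_bin = 7, 61

# Toy inputs (hypothetical): error-weighted derivatives of C_l.
W = rng.normal(size=(n_obs, n_bin))     # (dC_l/db_i) / sigma_l
w_x = rng.normal(size=n_obs)            # (dC_l/dx) / sigma_l

# Degraded Fisher matrix written as W^T P W, with P projecting out the x direction.
P = np.eye(n_obs) - np.outer(w_x, w_x) / (w_x @ w_x)
F_deg = W.T @ P @ W

# eigh returns eigenvalues in ascending order; sort them in descending order.
eigvals, eigvecs = np.linalg.eigh(F_deg)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Count modes whose eigenvalue is not negligible relative to the largest one.
n_modes = int(np.sum(eigvals > 1e-8 * eigvals[0]))


def reconstruct(delta_b, n_keep):
    """Expand delta_b in the eigenmodes and keep only the first n_keep terms."""
    modes = eigvecs[:, :n_keep]         # columns play the role of S_p(k_i)
    coeffs = modes.T @ delta_b          # expansion coefficients m_p
    return modes @ coeffs


print(n_modes)
```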

The fiducial model adopted in our analysis may be different from the true model which dictates
the data. Clearly, the reconstruction would be poor for wide discrepancies between the two. In
our analysis, the simulated bias serves as the input. In the absence of many alternative models
for large scale HI bias, this serves as a reasonable fiducial model.

We assume that one can then reconstruct the function $\delta b_i$ using only the first
$M \le n_{\rm bin}$ modes (see Eq. 6). Considering all the $n_{\rm bin}$ modes ensures that no
information is thrown away. However, this is achieved at the cost that errors in the recovered
quantities would be very large owing to the presence of the negligibly small eigenvalues. On
the contrary, lowering the number of modes can reduce the error but may introduce large biases
in the recovered quantities. An important step in this analysis is, therefore, to decide on the
number of modes $M$ to be used. In order to test this we consider a constant bias model to
represent the true model as against the fiducial model. For a given data set, Figure 8 shows
how the true model is reconstructed through the inclusion of more and more PCA modes. The
reconstruction is directly related to the quality of the data. In the $k$-range where data is
not available, the reconstruction is poor and the fiducial model is followed. The
reconstruction is also poor for large departures of the true model from the fiducial model. We
see that a reasonable reconstruction is obtained using the first 5 modes for $k < 1$, where the
data is available. In order to fix the value of $M$, we have used the Akaike information
criterion (Liddle 2007), ${\rm AIC} = \chi^2_{\rm min} + 2M$, whose smaller values are assumed
to imply a more favored model. Following the strategy used by Clarkson & Zunckel (2010) and
Mitra, Choudhury & Ferrara (2012), we have used different values of $M$ (2 to 6) for which the
AIC is close to its minimum and amalgamated them equally at the Monte Carlo stage when we
compute the errors. In this way, we ensure that the inherent bias which exists in any
particular choice of $M$ is reduced.
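The mode-selection criterion can be written in a few lines. The $\chi^2_{\rm min}$ values below are invented purely to show how the comparison works; they are not results from this analysis.

```python
def aic(chi2_min, n_modes):
    """Akaike information criterion in the form AIC = chi^2_min + 2 M."""
    return chi2_min + 2 * n_modes


# Hypothetical best-fit chi^2 for fits using M = 2 ... 7 PCA modes.
chi2_by_modes = {2: 9.1, 3: 6.8, 4: 5.0, 5: 3.4, 6: 1.9, 7: 1.8}

aic_by_modes = {m: aic(chi2, m) for m, chi2 in chi2_by_modes.items()}
best = min(aic_by_modes, key=aic_by_modes.get)

for m in sorted(aic_by_modes):
    print(m, round(aic_by_modes[m], 2))
print("smallest AIC at M =", best)
```

In this toy example the AIC varies slowly across several values of $M$ near the minimum, which is the kind of situation in which combining those values with equal weight, as described above, avoids committing to a single choice.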

Figure 6. The inverse of the eigenvalues of the degraded Fisher matrix,
which essentially measures the variance on the corresponding coefficient.

Figure 7. The first 6 eigenmodes of the degraded Fisher matrix.

Figure 8. The fiducial and constant (true) bias models are shown. The
reconstruction of the true model is shown for cases where the number of
PCA modes considered is M = 3, 5, 7.

Figure 9. The marginalized posterior distribution of the binned bias
function obtained from the MCMC analysis using the AIC criterion up to
the first 6 PCA eigenmodes. The solid line shows the mean values of the
bias parameters while the shaded regions represent the 2-σ confidence
limits. In addition, we show the fiducial and constant bias models.

We next perform the Markov Chain Monte Carlo (MCMC) analysis over
the parameter space of the optimum number of PCA amplitudes {m_p}
and x̄_HI. Other cosmological parameters are held fixed to the
WMAP7 best-fit values (see Section 2). We carry out the analysis
by taking M = 2 to M = 6, for which the AIC criterion is satisfied.
By choosing equal weights for each M and folding the corresponding
errors together, we reconstruct b(k) and thereby C_ℓ along with
their effective errors. We have developed a code based on the
publicly available COSMOMC (Lewis & Bridle 2002) for this purpose.
A number of distinct chains are run until the Gelman and Rubin
convergence statistic satisfies R − 1 < 0.001. We have also used
the convergence diagnostic of Raftery & Lewis to choose suitable
thinning conditions for each chain to obtain statistically
independent samples.
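
As an illustration of the convergence test mentioned above, the following is a minimal sketch of one common single-parameter form of the Gelman and Rubin statistic. It is not taken from COSMOMC, whose internal definition may differ in detail. Here R is assumed to be the ratio of the pooled variance estimate to the mean within-chain variance (some references quote its square root instead), and the chains are synthetic.

```python
import numpy as np


def gelman_rubin_minus_one(chains):
    """Return R - 1 for one parameter; `chains` has shape (n_chains, n_samples)."""
    chains = np.asarray(chains, dtype=float)
    n_samples = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()         # mean within-chain variance W
    between_over_n = chains.mean(axis=1).var(ddof=1)   # variance of chain means, B/n
    pooled = (n_samples - 1) / n_samples * within + between_over_n
    return pooled / within - 1.0


# Synthetic chains, for illustration only.
rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 20000))        # four well-mixed chains
shifted = mixed + np.arange(4)[:, None]    # chains sitting at different means

print(gelman_rubin_minus_one(mixed))       # small in magnitude
print(gelman_rubin_minus_one(shifted))     # large: not converged
```

Chains that sample the same distribution give a value of R − 1 close to zero, while chains stuck at different means give a value of order unity.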



5 RESULTS AND DISCUSSION 

The reconstructed bias function obtained using the analysis
described in the last section is shown in Figure 9. The solid line
represents the mean model while the shaded region corresponds to
the 95% confidence limits (2-σ). We have also shown the fiducial
model (short-dashed) as well as the popularly used constant bias
model, b ~ 1.5 (long-dashed), for comparison. We find that the
fiducial model is within the 95% confidence limits for the entire
k-range considered, while the constant bias is within the same
confidence limits only up to k ~ 2 Mpc^{-1}. We note that the
errors decrease drastically for k > 2 and k < 0.2 Mpc^{-1}. This
is expected from the nature of the Fisher matrix, which shows that
there is practically no information in the PCA modes from these
k-regions. Therefore, all models show a tendency to converge
towards the fiducial one. This is a direct manifestation of the
lack of data points probing these scales. Thus, most of the
information is concentrated in the range 0.2 < k < 2 Mpc^{-1},
within which reconstruction of the bias function is relevant with
the given data set.

Parameters                      2-σ errors
x̄_HI                            1.06 × 10^{-3}
b_lin (k = 0.3 Mpc^{-1})        0.453

Table 2. The 2-σ errors for x̄_HI and b_lin (k = 0.3 Mpc^{-1}) obtained
from the current analysis using the AIC criterion.

Figure 10. The reconstructed C_ℓ with its 2-σ confidence limits. The
points with error-bars denote the observational data. The solid,
short-dashed and long-dashed lines represent C_ℓ for the mean, fiducial
and constant bias models respectively.

Figure 11. The reconstructed P_HI(k) with its 2-σ confidence limits. The
points with error-bars denote the observational data. We have taken = 0.5
and z = 2.5.

The mean reconstructed bias simply follows the fiducial model
for 0.2 < k < 2 Mpc^{-1}. This is expected, as the simulated C_ℓ
data is generated using the fiducial bias model itself (Section 3).
In an analysis using real observed data this matching would have
statistical significance, whereas here it just serves as an
internal consistency check. The shaded region depicting the errors
around the mean is however meaningful and tells us how well the
given data can constrain the bias. The outline of the 2-σ
confidence limits shows a jagged feature which is directly related
to the presence of the data points. We observe that, apart from the
fiducial model, a constant bias model is also consistent with the
data within the 2-σ limits. In fact, other than imposing rough
bounds 1 < b(k) < 2, the present data can hardly constrain the
scale-dependence of the bias. It is also not possible for the C_ℓ
data with its error-bars to statistically distinguish between the
fiducial and the constant bias model in 0.2 < k < 2 Mpc^{-1}.
Figure 10 illustrates the recovered angular power spectrum with its
95% confidence limits. Superposed on it are the original data
points with error-bars. We also show the angular power spectrum
calculated for the fiducial and the constant bias models. The 2-σ
contour follows the pattern of the error-bars on the data points.
It is evident that the data is largely insensitive (within its
error-bars) to the different bias models. Hence the k-dependence of
the bias on these scales does not affect the observable quantity
C_ℓ within the bounds of statistical precision.
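
The tendency of the reconstruction to fall back on the fiducial model wherever the Fisher matrix carries no information can be illustrated with a toy example. Everything in this sketch is an assumption made for illustration: the k-bins, the toy fiducial bias, the Gaussian window response standing in for the real dependence of C_ℓ on b(k), and the unit data errors. The PCA amplitudes are obtained by a noise-free projection of the true deviation onto the eigenmodes, rather than by the MCMC sampling used in the analysis.

```python
import numpy as np

# Hypothetical k-bins [Mpc^-1] and toy bias models (illustrative only).
k = np.logspace(np.log10(0.02), np.log10(20.0), 40)
x = np.log10(k)
b_fid = 1.5 + 0.2 * np.log10(k / 0.3) ** 2   # toy scale-dependent fiducial bias
b_true = np.full_like(k, 1.5)                # constant "true" bias

# Toy response of 12 data points to the bias bins: Gaussian windows whose
# centres span only 0.2 <= k <= 2 Mpc^-1, mimicking the range with data.
centres = np.linspace(np.log10(0.2), np.log10(2.0), 12)
window = np.exp(-0.5 * ((x[None, :] - centres[:, None]) / 0.15) ** 2)
fisher = window.T @ window                   # unit data errors assumed

# Principal components: eigenvectors of the Fisher matrix, best-measured first.
eigval, eigvec = np.linalg.eigh(fisher)
eigvec = eigvec[:, np.argsort(eigval)[::-1]]


def reconstruct(n_modes):
    """Fiducial bias plus the deviation projected onto the first n_modes modes."""
    modes = eigvec[:, :n_modes]
    amplitudes = modes.T @ (b_true - b_fid)
    return b_fid + modes @ amplitudes


for n_modes in (3, 5, 7):
    residual = np.linalg.norm(b_true - reconstruct(n_modes))
    print(n_modes, residual)
```

In this toy setup the bins far outside the window coverage stay at the fiducial value however many modes are kept, because the well-measured eigenmodes have essentially no support there. Adding modes can only reduce the overall mismatch with the true model.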

While constructing the Fisher matrix, we had marginalized over
the largely unknown parameter x̄_HI. Treating it as an independent
free parameter, we have investigated the possibility of
constraining the neutral fraction using the simulated C_ℓ data.
The 2-σ error in this parameter obtained from our analysis is shown
in Table 2. We had used the fiducial value x̄_HI = 2.45 × 10^{-2}
in calculating C_ℓ. It is not surprising that our analysis gives a
mean x̄_HI = 2.44 × 10^{-2}, which is in excellent agreement with
the fiducial value. It is however more important to note that the
given data actually constrains x̄_HI reasonably well, at ~ 4%.
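
The quoted accuracy follows directly from the 2-σ error listed in Table 2 and the fiducial neutral fraction, taken here as 2.45 × 10^{-2}:

```latex
\frac{\Delta \bar{x}_{\rm HI}}{\bar{x}_{\rm HI}}
  = \frac{1.06 \times 10^{-3}}{2.45 \times 10^{-2}}
  \approx 0.043 \quad (\text{about } 4\%,\ 2\text{-}\sigma)
```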

Noting that, on large scales (k < 0.3 Mpc^{-1}), one cannot
distinguish between the mean, fiducial and the constant bias
models, we use b_lin (= 1.496) to denote the bias value on these
scales. The 2-σ error on b_lin is evaluated at k = 0.3 Mpc^{-1}
(shown in Table 2).

In the k-range of our interest, the fiducial model does not
reflect a significant departure from the constant bias. Further,
the confidence interval obtained from the data also reflects that
the observed C_ℓ is insensitive to the form of the bias function
b(k) in this range, provided that it is bounded between the
approximate cut-offs (1 < b(k) < 2). Moreover, the bias largely
affects the amplitude of the angular power spectrum and has only a
weak contribution towards determining its shape. A
scale-independent large-scale bias seems to be sufficient in
modelling the data. The mean neutral fraction, which globally sets
the amplitude of the power spectrum, is hence weakly degenerate
with the bias. This is manifested in the fact that though x̄_HI is
rather well constrained, the bias reconstruction, which uses the
degraded Fisher information (after marginalizing over x̄_HI), is
only weakly constrained from the same data. Prior independent
knowledge about the post-reionization neutral fraction would
clearly ensure a more statistically significant bias
reconstruction with smaller errors.

Figure 11 shows the reconstructed 3D HI power spectrum. The
direct algebraic relationship between the observable P_HI(k) and
the bias b(k) makes the 3D analysis relatively straightforward.
This is specifically evident since the Fisher matrix elements in
this case are non-zero only along the diagonal, at the specific
k-values corresponding to the data points. The entire routine
repeated here yields similar generic features. However, the key
difference is that we have a larger number of bins with high
sensitivity, leading to an improved constraint on the bias,
1.3 < b(k) < 1.7, in the range 0.2 < k < 0.7 Mpc^{-1}.

In the absence of real observed data, our proposed method,
applied to a simulated data set, reflects the possibility of
constraining the large-scale HI bias. The method is expected to
yield better results if one has precise knowledge about the
neutral content of the IGM and the underlying cosmological
paradigm. We note that the problem dealt with in this work, that
of constraining an unknown function given a known data set, is
fairly general and several alternative methods may be used. The
chief advantage of the method adopted here, apart from its
effective data reduction, is its model independence. The
non-parametric nature of the analysis is especially useful in the
absence of any specific prior information. A straightforward
fitting of a polynomial and estimation of its coefficients may
turn out to be effective, but there is no a priori reason to
believe that it would work. It is logically more reasonable not to
impose a model (with its parameters) upon the data and, instead, to
let the data reconstruct the model.

With upcoming radio observations aimed at measuring the HI power
spectrum, our method holds the promise of pinning down the nature
of the HI bias, thereby throwing valuable light on our
understanding of the HI distribution in the diffuse IGM.